Instrumentation Run Dataparallel C Architectural Linearization
Authors
Abstract
Recent advances in the power of parallel computers have made them attractive for solving large computational problems. Scalable parallel programs are particularly well suited to Massively Parallel Processing (MPP) machines since the number of computations can be increased to match the available number of processors. Performance tuning can be particularly difficult for these applications since it must often be performed with a smaller problem size than that targeted for eventual execution. This research develops a performance prediction methodology that addresses this problem through symbolic analysis of program source code. Algebraic manipulations can then be performed on the resulting analytical model to determine performance for scaled-up applications on different hardware architectures.
Similar resources
Dynamic Instrumentation and Optimization for GPU Applications
Parallel architectures like GPUs are a tantalizing compute fabric for performance-hungry developers. While GPUs enable order-of-magnitude performance increases in many data-parallel application domains, writing efficient codes that can actually manifest those increases is a non-trivial endeavor, typically requiring developers to exercise specialized architectural features exposed directly in the...
HP Caliper: A Framework for Performance Analysis
You perform statistical sampling by taking periodic snapshots of a program’s state. Statistical sampling is nonintrusive—unlike binary instrumentation, statistical sampling doesn’t add any lines of code to the application being tested—but the computing community generally regards this technique as imprecise. It imposes low overhead on a program’s runtime performance and can be used for time-cri...
Performance Analysis of Large-Scale OpenMP and Hybrid MPI/OpenMP Applications with Vampir NG
This paper presents a tool setup for comprehensive event-based performance analysis of large-scale OpenMP and hybrid OpenMP/MPI applications. The KOJAK framework is used for portable code instrumentation and automatic analysis while the new VampirNG infrastructure serves as generic visualization engine for both OpenMP and MPI performance properties. The tools share the same data base which enab...
Implementation of the Parallel Superposition in Bulk-Synchronous Parallel ML
Bulk-Synchronous Parallel ML (BSML) is a functional data-parallel language for coding Bulk-Synchronous Parallel (BSP) algorithms. It allows execution time to be estimated and avoids deadlocks and nondeterminism. This paper presents the implementation of a new primitive for BSML which can express divide-and-conquer algorithms.
Fundamental issues in designing data-parallel dataflow computers
This paper analyses the fundamental primitives in the data-parallel computational model and proposes architectural solutions to these within the framework of a data-parallel dataflow computer. The collective behaviour of this paradigm enables a novel caching mechanism to be used in conjunction with an ETS matching store. It is also shown how collective behaviour may be exploited in opt...